AITopics | phonetic feature

Collaborating Authors

phonetic feature

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification

de Zuazo, Xabier, Saratxaga, Ibon, Navas, Eva

arXiv.org Artificial IntelligenceDec-2-2025

For Speech Detection, a MEG-oriented SpecAugment provided a first exploration of MEG-specific augmentation. For Phoneme Classification, we used inverse-square-root class weighting and a dynamic grouping loader to handle 100-sample averaged examples. In addition, a simple instance-level normalization proved critical to mitigate distribution shifts on the holdout split. Using the official Standard track splits and F1-macro for model selection, our best systems achieved 88.9% (Speech) and 65.8% (Phoneme) on the leaderboard, surpassing the competition baselines and ranking within the top-10 in both tasks.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2512.01443

Country: Europe > Spain (0.28)

Genre: Research Report > Experimental Study (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

A Sociophonetic Analysis of Racial Bias in Commercial ASR Systems Using the Pacific Northwest English Corpus

Scott, Michael, Liang, Siyu, Wassink, Alicia, Levow, Gina-Anne

arXiv.org Artificial IntelligenceOct-28-2025

This paper presents a systematic evaluation of racial bias in four major commercial automatic speech recognition (ASR) systems using the Pacific Northwest English (PNWE) corpus. We analyze transcription accuracy across speakers from four ethnic backgrounds (African American, Caucasian American, ChicanX, and Yakama) and examine how sociophonetic variation contributes to differential system performance. We introduce a heuristically-determined Phonetic Error Rate (PER) metric that links recognition errors to specific linguistically motivated variables derived from sociophonetic annotation. Our analysis of eleven sociophonetic features reveals that vowel quality variation, particularly resistance to the low-back merger and pre-nasal merger patterns, is systematically associated with differential error rates across ethnic groups, with the most pronounced effects for African American speakers across all evaluated systems. These findings demonstrate that acoustic modeling of dialectal phonetic variation, rather than lexical or syntactic factors, remains a primary source of bias in commercial ASR systems. The study establishes the PNWE corpus as a valuable resource for bias evaluation in speech technologies and provides actionable guidance for improving ASR performance through targeted representation of sociophonetic diversity in training data.

american speaker, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.22495

Country:

North America > United States (0.93)
Europe (0.68)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.58)

Add feedback

Picturized and Recited with Dialects: A Multimodal Chinese Representation Framework for Sentiment Analysis of Classical Chinese Poetry

Du, Xiaocong, Pei, Haoyu, Zhang, Haipeng

arXiv.org Artificial IntelligenceMay-20-2025

Classical Chinese poetry is a vital and enduring part of Chinese literature, conveying profound emotional resonance. Existing studies analyze sentiment based on textual meanings, overlooking the unique rhythmic and visual features inherent in poetry, especially since it is often recited and accompanied by Chinese paintings. In this work, we propose a dialect-enhanced multimodal framework for classical Chinese poetry sentiment analysis. We extract sentence-level audio features from the poetry and incorporate audio from multiple dialects, which may retain regional ancient Chinese phonetic features, enriching the phonetic representation. Additionally, we generate sentence-level visual features, and the multimodal features are fused with textual features enhanced by LLM translation through multimodal contrastive representation learning. Our framework outperforms state-of-the-art methods on two public datasets, achieving at least 2.51% improvement in accuracy and 1.63% in macro F1. We open-source the code to facilitate research in this area and provide insights for general multimodal Chinese representation.

artificial intelligence, large language model, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.1321

Country:

Asia > China > Guangdong Province (0.14)
Asia > China > Beijing > Beijing (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Shandong Province (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (0.57)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.51)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.51)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.35)

Add feedback

Use of Multi-Layered Networks for Coding Speech with Phonetic Features

Neural Information Processing SystemsApr-6-2023, 19:57:34 GMT

Preliminary results on speaker-independant speech recognition are reported. A method that combines expertise on neural networks with expertise on speech recognition is used to build the recognition systems. For transient sounds, event(cid:173) driven property extractors with variable resolution in the time and frequency domains are used. For sonorant speech, a model of the human auditory system is preferred to FFT as a front-end module.

coding speech, multi-layered network, phonetic feature, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.35)

Add feedback

Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion

Deng, Jiangyi, Chen, Yanjiao, Zhong, Yinan, Miao, Qianhao, Gong, Xueluan, Xu, Wenyuan

arXiv.org Artificial IntelligenceFeb-23-2023

Voice conversion (VC) techniques can be abused by malicious parties to transform their audios to sound like a target speaker, making it hard for a human being or a speaker verification/identification system to trace the source speaker. In this paper, we make the first attempt to restore the source voiceprint from audios synthesized by voice conversion methods with high credit. However, unveiling the features of the source speaker from a converted audio is challenging since the voice conversion operation intends to disentangle the original features and infuse the features of the target speaker. To fulfill our goal, we develop Revelio, a representation learning model, which learns to effectively extract the voiceprint of the source speaker from converted audio samples. We equip Revelio with a carefully-designed differential rectification algorithm to eliminate the influence of the target speaker by removing the representation component that is parallel to the voiceprint of the target speaker. We have conducted extensive experiments to evaluate the capability of Revelio in restoring voiceprint from audios converted by VQVC, VQVC+, AGAIN, and BNE. The experiments verify that Revelio is able to rebuild voiceprints that can be traced to the source speaker by speaker verification and identification systems. Revelio also exhibits robust performance under inter-gender conversion, unseen languages, and telephony networks.

artificial intelligence, machine learning, voiceprint, (22 more...)

arXiv.org Artificial Intelligence

2302.12434

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America (0.04)
Europe (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (0.71)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.70)

Add feedback

Reconstructing Speech Stimuli From Human Auditory Cortex Activity Using a WaveNet Approach

Wang, Ran, Wang, Yao, Flinker, Adeen

arXiv.org Machine LearningNov-7-2018

Abstract--The superior temporal gyrus (STG) region of cortex critically contributes to speech recognition. In this work, we show that a proposed deep network inspired by WaveNet, trained with limited available data, is able to reconstruct speech stimuli from STG intracranial recordings. We further investigate the impulse response of the fitted model for each recording electrode and observe phoneme level temporospectral tuning properties in some recorded area. This discovery is consistent with previous studies implicating the posterior STG (pSTG) in a phonetic representation of speech and provides detailed acoustic features that certain electrode sites possibly extract during speech recognition. Research studies on the superior temporal gyrus (STG) cortex area have shown that this area plays an important role in words and sentence recognition on a phonetic and prelexical stage [1]-[9].

artificial intelligence, machine learning, spectrogram, (18 more...)

arXiv.org Machine Learning

1811.02694

Country: North America > United States > New York (0.29)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Poetic Sound Similarity Vectors Using Phonetic Features

Parrish, Allison (New York University)

AAAI ConferencesOct-1-2017

A procedure that uses phonetic transcriptions of words to produce a continuous vector-space model of phonetic sound similarity is presented. The vector dimensions of words in the model are calculated using interleaved phonetic feature bigrams, a novel method that captures similarities in sound that are difficult to model with orthographic or phonemic information alone. Measurements of similarity between items in the resulting vector space are shown to perform well on established tests for predicting phonetic similarity. Additionally, a number of applications of vector arithmetic and nearest-neighbor search are presented, demonstrating potential uses of the vector space in experimental poetry and procedural content generation.

machine learning, natural language, similarity vector, (2 more...)

AAAI Conferences

Thirteenth Artificial Intelligence and Interactive Digital Entertainment Conference

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.87)

Add feedback

Improving Spoken Dialogue Understanding Using Phonetic Mixture Models

Wang, William Yang (Columbia University) | Artstein, Ron (USC Institute for Creative Technologies) | Leuski, Anton (USC Institute for Creative Technologies) | Traum, David (USC Institute for Creative Technologies)

AAAI ConferencesMay-18-2011

Augmenting word tokens with a phonetic representation, derived from a dictionary, improves the performance of a Natural Language Understanding component that interprets speech recognizer output: we observed a 5% to 7% reduction in errors across a wide range of response return rates. The best performance comes from mixture models incorporating both word and phone features. Since the phonetic representation is derived from a dictionary, the method can be applied easily without the need for integration with a specific speech recognizer. The method has similarities with autonomous (or bottom-up) psychological models of lexical access, where contextual information is not integrated at the stage of auditory perception but rather later.

language model, tokenizer, utterance, (15 more...)

AAAI Conferences

Twenty-Fourth International FLAIRS Conference

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > Monroe County > Rochester (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(3 more...)

Genre: Research Report (0.46)

Industry: Government (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.93)

Add feedback